Before you dismiss these ramblings as nitpicking, let me pose a few questions:
- If you pay for a gallon of gas, would you be content if the pump delivered 3.6 quarts instead of 4.0?
- If someone promises to meet you in an hour, are they "on time" if they arrive in an hour and 15 minutes?
- If you buy a dozen eggs, do you expect there to be twelve of them?
In all of these areas, we expect precision in our understanding of each other (even if, as in the example using time, we allow for imprecision in actual performance). This seems to me quite a reasonable expectation.
And yet, for decades, we've accepted in common usage an ambiguity in our descriptions of the memory and disk capacities of computers. The discrepancy started small, but as capacities have grown and we have moved to ever larger units, it has grown steadily as a percentage of the sizes involved. Meanwhile, there is an alternative terminology that we could be using for greater precision.
It started innocently enough. Computers are binary devices and computer resources are addressed in binary units. Early computer documentation explicitly specified the (very small) memory and disk sizes with numbers like 1024, 2048, 4096, 8192 and so on. But as the available sizes grew, our impulse for brevity in speech and writing led to a desire to round these off to larger units. We were already familiar with the metric system prefixes -- kilo for thousand (10^3), mega for million (10^6), and giga for billion (10^9) -- and it was noted that those units were "pretty close" to some units derived from binary units like 1024 (2^10). So it became common to use kilo (sometimes) for 1024 units of computer-related things, while it was also used for 1000 units (the strict definition of the prefix). So ambiguity was accepted for convenience -- after all, there's less than 2.5% difference between 1000 and 1024, right?
Some parts of our industry used this discrepancy to their advantage, and to the confusion of their customers. For example, disk drive manufacturers prefer to report the size of their drives in decimal units (because it makes their drives sound larger) despite the fact that virtually all software will report disk sizes in binary units (making the reported size of the drive seem smaller than advertised on the box).
The extent of the ambiguity is considerable. Hard disk manufacturers use the prefixes to reflect decimal values. Operating systems report hard disk sizes using binary values. Flash memory uses decimal values. RAM uses binary values. CD-ROMs are measured in binary values while DVDs are measured in decimal values. And diskette drives (if you can still find one) are measured in a strange hybrid where 1 "megabyte" means 1024 x 1000 bytes. Clock rates, data transfer rates, and network communication speeds are all generally measured in decimal values.
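To make the mismatch concrete, here is a small sketch of the arithmetic. The "1 TB" drive is a hypothetical example of my own, as are the function names; the diskette figure assumes the familiar "1.44 MB" format, which is really 1440 units of 1024 bytes.

```python
# Hypothetical example: a drive sold as "1 TB" in the decimal sense.
ADVERTISED_BYTES = 1 * 10**12


def reported_binary_gigabytes(num_bytes: int) -> float:
    """Size in units of 2**30 bytes, which much software simply labels 'GB'."""
    return num_bytes / 2**30


def diskette_bytes(label_kilobytes: int = 1440) -> int:
    """The hybrid diskette 'megabyte': 1440 units of 1024 bytes each."""
    return label_kilobytes * 1024


if __name__ == "__main__":
    # The same drive, seen through binary units.
    print(f"{reported_binary_gigabytes(ADVERTISED_BYTES):.2f}")  # 931.32
    # 1.44 x 1000 x 1024 bytes, neither purely decimal nor purely binary.
    print(diskette_bytes())  # 1474560
```

So a drive that is exactly as large as the box claims can still look like it is missing almost 70 "gigabytes" once the operating system reports it.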
But the larger the unit, the larger the discrepancy becomes as a percentage. Consider:
A kilobyte
- Defined as 10^3 is 1,000 bytes
- Defined as 2^10 is 1,024 bytes
- This is a 2.40% discrepancy
A megabyte
- Defined as 10^6 is 1,000,000 bytes
- Defined as 2^20 is 1,048,576 bytes
- This is a 4.86% discrepancy
A gigabyte
- Defined as 10^9 is 1,000,000,000 bytes
- Defined as 2^30 is 1,073,741,824 bytes
- This is a 7.37% discrepancy
A terabyte
- Defined as 10^12 is 1,000,000,000,000 bytes
- Defined as 2^40 is 1,099,511,627,776 bytes
- This is a 9.95% discrepancy
A petabyte
- Defined as 10^15 is 1,000,000,000,000,000 bytes
- Defined as 2^50 is 1,125,899,906,842,624 bytes
- This is a 12.59% discrepancy
An exabyte
- Defined as 10^18 is 1,000,000,000,000,000,000 bytes
- Defined as 2^60 is 1,152,921,504,606,846,976 bytes
- This is a 15.29% discrepancy
A 2.5% difference between what you mean by a term and what I think you mean by a term may be fairly trivial. But a 10%, 12% or 15% difference is not so readily dismissed.
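If you'd like to check the figures listed above for yourself, a few lines of Python will do it. This is only a sketch; the function and list names are my own. Python integers have arbitrary precision, so the large binary values come out exact rather than rounded.

```python
# Prefix names in order; the n-th prefix means 10**(3n) (decimal) or 2**(10n) (binary).
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa"]


def discrepancy_percent(n: int) -> float:
    """How much larger 2**(10n) is than 10**(3n), as a percentage of the decimal value."""
    decimal = 10 ** (3 * n)
    binary = 2 ** (10 * n)
    return (binary - decimal) / decimal * 100


if __name__ == "__main__":
    for n, name in enumerate(PREFIXES, start=1):
        binary = 2 ** (10 * n)
        print(f"{name}byte: 2^{10 * n} = {binary:,} bytes "
              f"({discrepancy_percent(n):.2f}% discrepancy)")
```

The percentage climbs by roughly two and a half points with each step up the prefix ladder, because every step multiplies the ratio by another factor of 1.024.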
And there is an alternative.
As long ago as 1968, some computer scientists began proposing a separate set of prefixes that refer exclusively to the binary definitions. And as long ago as 1998 a number of standards bodies and trade organizations approved, and recommended adoption of, a new naming convention for these units. The new names for the binary units are based on the decimal names, but replace the second syllable of the corresponding decimal name with "bi" for "binary". So 1000 bytes is a kilobyte but 1024 bytes is a "kibibyte". Under the new (now 14-year-old) proposals, each of the following units would have a unique and unambiguous meaning:
| Decimal Prefixes | Binary Prefixes |
| Name | Definition | Name | Definition |
| kilo | 1000 or 10^3 | kibi | 1024 or 2^10 |
| mega | 1000^2 or 10^6 | mebi | 1024^2 or 2^20 |
| giga | 1000^3 or 10^9 | gibi | 1024^3 or 2^30 |
| tera | 1000^4 or 10^12 | tebi | 1024^4 or 2^40 |
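Software can adopt the prefixes in the table above with very little effort. Here is a minimal sketch of a size formatter that labels its output unambiguously. The function name, the `binary` flag, and the two-decimal formatting are my own choices for illustration; the unit symbols (KiB, MiB, GiB, TiB for binary; kB, MB, GB, TB for decimal) are the standard abbreviations for the prefix names.

```python
# Unit symbols for the binary prefixes (kibi, mebi, gibi, tebi)
# and the decimal prefixes (kilo, mega, giga, tera).
BINARY_UNITS = ["B", "KiB", "MiB", "GiB", "TiB"]
DECIMAL_UNITS = ["B", "kB", "MB", "GB", "TB"]


def format_bytes(num_bytes: int, binary: bool = True) -> str:
    """Format a byte count with an unambiguous binary or decimal unit."""
    base = 1024 if binary else 1000
    units = BINARY_UNITS if binary else DECIMAL_UNITS
    value = float(num_bytes)
    index = 0
    # Step up one unit at a time until the value fits, or we run out of units.
    while value >= base and index < len(units) - 1:
        value /= base
        index += 1
    return f"{value:.2f} {units[index]}"


if __name__ == "__main__":
    print(format_bytes(1024))                  # 1.00 KiB
    print(format_bytes(1000, binary=False))    # 1.00 kB
    print(format_bytes(10**12))                # 931.32 GiB
    print(format_bytes(10**12, binary=False))  # 1.00 TB
```

Either form is fine on its own; the point is that the label now tells the reader which one they are looking at.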
By now (2012) many organizations recommend the use of these conventions (or at least insist that the traditional prefixes refer only to the decimal values), including:
- The International Electrotechnical Commission (IEC)
- The U.S. National Institute of Standards and Technology (NIST)
- The IEEE
- The International Bureau of Weights and Measures (BIPM)
- The Society of Automotive Engineers (SAE)
- The European Committee for Electrotechnical Standardization (CENELEC)
I don't know about you, but I'm sold on this. Going forward, I'll be more precise about the way I describe memory and disk space. I hope you'll join me.
A few links for further reading:
http://physics.nist.gov/cuu/Units/binary.html
http://en.wikipedia.org/wiki/Binary_prefix
http://en.wikipedia.org/wiki/International_System_of_Units
http://lpar.ath0.com/2008/07/15/si-unit-prefixes-a-plea-for-sanity/
http://members.optus.net/alexey/prefBin.xhtml